Skip to content

gh-138004: fix threadmodule ascii and make thread naming test more lenient #138017

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

jadonduff
Copy link

@jadonduff jadonduff commented Aug 21, 2025

Fallback to ASCII in _thread.set_name() when pthread_setname_np() rejects UTF-8

Summary

Issue: test_set_name fails on OpenIndiana (and Solaris descendants).
It seems that pthread_setname_np() only accepts ASCII-only names there:

>>> import _thread
>>> _thread.set_name('€')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument

Fix

1. _threadmodule.c

  • Attempt to encode the thread name in UTF-8 and call pthread_setname_np() as usual.
  • If the call fails with EINVAL, fall back to ASCII encoding using "replace" (non-ASCII → ?).
  • On macOS and other platforms that already accept UTF-8, the original UTF-8 succeeds and the fallback is never taken. ASCII names continue to work unchanged.
int rc;

// Fallback: If EINVAL, try ASCII encoding with "replace"
if (rc == EINVAL) {
    name_encoded = PyUnicode_AsEncodedString(name_obj, "ascii", "replace");
    if (name_encoded == NULL) {
        return NULL;
    }
#ifdef _PYTHREAD_NAME_MAXLEN
    if (PyBytes_GET_SIZE(name_encoded) > _PYTHREAD_NAME_MAXLEN) {
        PyObject *truncated;
        truncated = PyBytes_FromStringAndSize(PyBytes_AS_STRING(name_encoded),
                                              _PYTHREAD_NAME_MAXLEN);
        if (truncated == NULL) {
            Py_DECREF(name_encoded);
            return NULL;
        }
        Py_SETREF(name_encoded, truncated);
    }
#endif
    name = PyBytes_AS_STRING(name_encoded);
#ifdef __APPLE__
    rc = pthread_setname_np(name);
#elif defined(__NetBSD__)
    thread = pthread_self();
    rc = pthread_setname_np(thread, "%s", (void *)name);
#elif defined(HAVE_PTHREAD_SETNAME_NP)
    thread = pthread_self();
    rc = pthread_setname_np(thread, name);
#else /* defined(HAVE_PTHREAD_SET_NAME_NP) */
    thread = pthread_self();
    rc = 0;
    pthread_set_name_np(thread, name);
#endif
    Py_DECREF(name_encoded);
}

Explanation:

  • Keeps current behavior everywhere UTF-8 is supported.
  • Provides a fallback for platforms that only accept ASCII (e.g. OpenIndiana).
  • Avoids raising OSError(22) for non-ASCII names.

2. test_thread.py

  • Updated test_set_name to be lenient on platforms that reject non-ASCII names.
  • If OSError(22) occurs and the attempted name contained non-ASCII, the test is skipped instead of failing.
try:
    thread.start()
    thread.join()
    self.assertEqual(work_name, expected,
                     f"{len(work_name)=} and {len(expected)=}")
except OSError as exc:
    # Accept EINVAL (22) for non-ASCII names on platforms that do not support them
    if getattr(exc, 'errno', None) == 22 and any(ord(c) > 127 for c in name):
        self.skipTest(f"Platform does not support non-ASCII thread names: {exc}")
    else:
        raise

Explanation:

  • On macOS/Linux: runs as before, asserts the name round-trips correctly.
  • On OpenIndiana: skips the test if non-ASCII is not supported.

Verification

  • Ran locally on macOS (UTF-8 names continue to work as before).
  • Did not verify on OpenIndiana (cannot reproduce locally). Fix based on reports.
  • All tests passed on macOS ARM.

@python-cla-bot
Copy link

python-cla-bot bot commented Aug 21, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Aug 21, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Comment on lines 2632 to 2662
name_encoded = PyUnicode_AsEncodedString(name_obj, "ascii", "replace");
if (name_encoded == NULL) {
return NULL;
}
#ifdef _PYTHREAD_NAME_MAXLEN
if (PyBytes_GET_SIZE(name_encoded) > _PYTHREAD_NAME_MAXLEN) {
PyObject *truncated;
truncated = PyBytes_FromStringAndSize(PyBytes_AS_STRING(name_encoded),
_PYTHREAD_NAME_MAXLEN);
if (truncated == NULL) {
Py_DECREF(name_encoded);
return NULL;
}
Py_SETREF(name_encoded, truncated);
}
#endif
name = PyBytes_AS_STRING(name_encoded);
#ifdef __APPLE__
rc = pthread_setname_np(name);
#elif defined(__NetBSD__)
thread = pthread_self();
rc = pthread_setname_np(thread, "%s", (void *)name);
#elif defined(HAVE_PTHREAD_SETNAME_NP)
thread = pthread_self();
rc = pthread_setname_np(thread, name);
#else /* defined(HAVE_PTHREAD_SET_NAME_NP) */
thread = pthread_self();
rc = 0;
pthread_set_name_np(thread, name);
#endif
Py_DECREF(name_encoded);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not duplicate such complex code. Move it to function and reuse. Or use a loop (but this may be more complicated).

Copy link
Author

@jadonduff jadonduff Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved to a function and reused. Despite most tests passing, I am receiving the following for the "Check if generated files are up to date" test:

 /usr/bin/ld: Modules/_threadmodule.o: in function `_thread_set_name_impl':
/home/runner/work/cpython/cpython/./Modules/_threadmodule.c:2617:(.text+0x2ed5): undefined reference to `_set_thread_name'
/usr/bin/ld: /home/runner/work/cpython/cpython/./Modules/_threadmodule.c:2637:(.text+0x2f99): undefined reference to `_set_thread_name'
collect2: error: ld returned 1 exit status
make: *** [Makefile:1927: Programs/_freeze_module] Error 1
Error: Process completed with exit code 2.

Please let me know if you see the issue. It appeared to be working before changing to function.

thread.join()
self.assertEqual(work_name, expected,
f"{len(work_name)=} and {len(expected)=}")
except OSError as exc:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is OSError even raised here? The test failure was different -- that work_name was unexpectedly empty.

Copy link
Author

@jadonduff jadonduff Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added:

                    if any(ord(c) > 127 for c in name) and (not work_name or work_name == ""):
                        self.skipTest(f"Platform does not support non-ASCII thread names: got empty name for {name!r}")
                    self.assertEqual(work_name, expected,
                                     f"{len(work_name)=} and {len(expected)=}")

Kept OSError fallback because it was shown in original issue, but can remove if needed.

>>> import _thread
>>> _thread.set_name('€')
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    _thread.set_name('€')
    ~~~~~~~~~~~~~~~~^^^^^
OSError: [Errno 22] Invalid argument

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was shown that _thread.set_name() fails, but that OSError is not leaked from the Thread code.

@bedevere-app
Copy link

bedevere-app bot commented Aug 21, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@jadonduff
Copy link
Author

Please see comments regarding the requested changes. OS tests passed, but a function not found error occurred in "Tests / Check if generated files are up to date (pull_request)", which I am having trouble fixing. Apologies - this is my first attempt at contributing to Python.

@jadonduff
Copy link
Author

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented Aug 22, 2025

Thanks for making the requested changes!

@serhiy-storchaka: please review the changes made to this pull request.

Comment on lines 2651 to 2660
#ifdef _PYTHREAD_NAME_MAXLEN
if (PyBytes_GET_SIZE(name_encoded) > _PYTHREAD_NAME_MAXLEN) {
PyObject *truncated = PyBytes_FromStringAndSize(PyBytes_AS_STRING(name_encoded), _PYTHREAD_NAME_MAXLEN);
if (truncated == NULL) {
Py_DECREF(name_encoded);
return NULL;
}
Py_SETREF(name_encoded, truncated);
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, please try to avoid duplicating code. This should be factored out into its own function. Here's an outline:

static PyObject *
get_truncated(PyObject *name_encoded /* stolen */)
{
#ifdef _PYTHREAD_NAME_MAXLEN
    if (PyBytes_GET_SIZE(name_encoded) > _PYTHREAD_NAME_MAXLEN) {
        PyObject *truncated = PyBytes_FromStringAndSize(PyBytes_AS_STRING(name_encoded), _PYTHREAD_NAME_MAXLEN);
        if (truncated == NULL) {
            Py_DECREF(name_encoded);
            return NULL;
        }
        Py_SETREF(name_encoded, truncated);
    }
#endif
    return name_encoded;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or simply include the encoding and truncating code in _set_thread_name(). BTW, there is no need to use underscored names here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created function encode_thread_name and used to replace duplicated code

@@ -2241,6 +2241,7 @@ def __init__(self, a, *, b) -> None:

with warnings.catch_warnings(record=True) as warnings_log:
CustomRLock(1, b=2)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray newline change:

Suggested change

jadonduff and others added 3 commits August 22, 2025 13:28
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
@jadonduff
Copy link
Author

Pulled _threadmodule.c from main and re-wrote functions. Added two functions - set_native_thread_name and encode_thread_name (using ZeroIntensity's outline). Also removed OSError exception handing in test_threading.py (serhiy-storchaka)

Comment on lines 2646 to 2651
name_encoded = encode_thread_name(name_obj, "ascii");
if (name_encoded == NULL) {
return NULL;
}
name = PyBytes_AS_STRING(name_encoded);
rc = set_native_thread_name(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can merge all this (and the following Py_DECREF(name_encoded)) in a single function.

@@ -2360,6 +2360,9 @@ def work():
thread = threading.Thread(target=work, name=name)
thread.start()
thread.join()
# If the name is non-ASCII and the result is empty, skip (platform limitation)
if any(ord(c) > 127 for c in name) and (not work_name or work_name == ""):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the isascii() method. And why not work_name or work_name == ""?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used isascii() method, and changed to:

# If the name is non-ASCII and the result is empty, skip (platform limitation)
if not name.isascii() and not work_name:
    self.skipTest(f"Platform does not support non-ASCII thread names: got empty name for {name!r}")
self.assertEqual(work_name, expected,
                 f"{len(work_name)=} and {len(expected)=}")

Comment on lines -2584 to -2589
#ifdef __sun
// Solaris always uses UTF-8
const char *encoding = "utf-8";
#else
// Encode the thread name to the filesystem encoding using the "replace"
// error handler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove all this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to encode_thread_name

@@ -2894,4 +2924,4 @@ PyMODINIT_FUNC
PyInit__thread(void)
{
return PyModuleDef_Init(&thread_module);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change (newline?).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May have accidentally added a newline in a previous commit. It now matches the current _threadmodule.c in the main branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this change.

Py_RETURN_NONE;
#else
// Windows implementation
assert(pSetThreadDescription != NULL);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those blank lines could be kept as this part of the code is not touched.

jadonduff and others added 3 commits August 22, 2025 16:58
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@@ -2360,6 +2361,9 @@ def work():
thread = threading.Thread(target=work, name=name)
thread.start()
thread.join()
# If the name is non-ASCII and the result is empty, skip (platform limitation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not repeat the code in a comment.

@@ -2894,4 +2924,4 @@ PyMODINIT_FUNC
PyInit__thread(void)
{
return PyModuleDef_Init(&thread_module);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this change.

if (rc) {
errno = rc;
int err = rc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it was needed to introduce the err variable?

Comment on lines +116 to +117
Py_DECREF(name_encoded);
return truncated;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code.

Comment on lines +127 to +134
PyObject *name_encoded = encode_thread_name(name_obj, encoding);
if (name_encoded == NULL) {
return -1; // error, exception set
}
const char *name = PyBytes_AS_STRING(name_encoded);
int rc = set_native_thread_name(name);
Py_DECREF(name_encoded);
return rc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can now inline set_native_thread_name() and encode_thread_name() as it was in the original code.

{
PyObject *name_encoded = encode_thread_name(name_obj, encoding);
if (name_encoded == NULL) {
return -1; // error, exception set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens at the caller place when set_thread_name_with_encoding() returns -1?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants